DIVERSITY AND PHYLOGENETIC CLASSIFICATION: 
A REFLECTION ON METHODS OF ANALYSIS 
ABSTRACT 


Graf and Cummings (2006), hereafter referred to simply as “G & C”, provided phylogenetic 
analyses of a three-partition data set in order to (1) examine the higher level evolutionary rela- 
tionships within the Palaeoheterodonta, (2) estimate the history of character state change, and 
(3) develop a phylogenetic classification for the group. However, portions of the available COI 
DNA sequence data, for multiple terminals, were omitted from G & C’s phylogenetic analyses, 
and no attempt was made to explicitly account for the documented saturation in the COI data 
partition. In order to evaluate the effects of these omissions, we performed Bayesian inference 
(BI) as well as maximum parsimony (MP) analyses on G & C’s combined evidence (CE) matrix 
that included all of the ingroup COI sequences contained in G & C, plus the omitted outgroup 
COI sequences. We conclude that G & C’s COI DNA sequence omissions, when combined 
with MP analyses not accounting for COI saturation, negatively affected the topologies of the 
best trees obtained from phylogenetic analyses of the CE matrix. This conclusion questions 
the utility of G & C’s inferences regarding palaeoheterodont bivalve character evolution as 
well as the taxonomic classification drawn from its preferred topology. For example, counter 
to G & C’s inferences, our BI and “transformed COI” MP analyses determined that unionoid 
oyster conchology has evolved multiple times, and all of our phylogenetic analyses indicate 
that the Etheriidae (sensu G & C) is not monophyletic. However, it should be noted that, to 
date, no phylogenetic analysis of this data set has robustly estimated all basal nodes within the 
Unionoida. Therefore, any inferences regarding unionoid bivalve character evolution, diversity 
and classification drawn from these topologies should be considered weakly supported. 

Key words: Palaeoheterodonta, Unionoida, Bayesian phylogenetics, maximum parsimony, 
phylogenetic classification, COI, 28S, morphology. 


INTRODUCTION 


Graf & Cummings (2006), hereafter referred 
to simply as “G & C”, provided phylogenetic 
analyses of a three-partition (partial COI, partial 
28S and morphology/life history characters), 
combined evidence (CE) data set in order to 
(1) examine the higher level evolutionary re- 
lationships within the Palaeoheterodonta, (2) 
estimate the history of character state change, 
and (3) develop a phylogenetic classification for 
the group. Given the unsettled nature of rela- 
tionships within the constituent order Unionoida 
(= freshwater mussels; reviews in Roe & Hoeh, 
2003; Walker et al., 2006; Figs. 1, 2, herein), 
we applaud this effort. 


But, given G & C’s acceptance of a combined 
evidence philosophy, we find it curious that G & 
C did not include the available, homologous COI 
sequences (actually F COI mitochondrial DNA 
[mtDNA] sequences; see Breton et al., 2007, 
for a recent review of doubly uniparental inheri- 
tance of mtDNA) for the two outgroup terminals, 
Mytilus and Astarte, in any of its phylogenetic 
analyses. Additionally, G & C did not include the 
available, homologous COI sequences for three 
ingroup terminals, Coelatura, Pseudomulleria 
and Obliquaria in the analysis that produced 
its preferred tree (G & C: fig. 3; from which the 
higher-taxonomic relationships are portrayed in 
G & C: fig. 4 and Fig. 1, herein). G & C’s omission 
of these ingroup COI sequences resulted in 
the removal of all of the available DNA characters 
for two of these three terminals. 

FIG. 1. Summary tree showing the higher level relationships within the order Unionoida 
as portrayed in G & C (2006: fig. 4). Familial and superfamilial nomenclature: from G & C 
(2006). 

We also found curious G & C’s position regarding 
the proper method of analysis for COI 
DNA sequences. Despite previous publications’ 
documentation of COI DNA sequence saturation 
(e.g., Hoeh et al., 1998; Graf & Ó Foighil, 
2000) at this time depth (> 200 my; Watters, 
2001), G & C’s phylogenetic analyses did not 
encompass methods designed to compensate 
for the acknowledged saturation in the COI DNA 
sequences. G & C gave no explanation for this 
apparent incongruity. 

Thus, in order to evaluate the effects of G & 
C’s (1) outgroup and ingroup COI DNA sequence 
omissions and (2) analysis of COI DNA sequences 
without explicitly addressing the issue 
of saturation on inferences regarding palaeoheterodont 
bivalve evolutionary relationships, 
we performed Bayesian inference (BI) as well as 
maximum parsimony (MP) analyses on G & C’s 
CE matrix that included all of the data contained 
in G & C plus Mytilus and Astarte COI DNA sequences. 
The MP analyses were performed on 
CE matrices containing both non-transformed 
and transformed (= 3rd codon position transitions 
deleted) COI sequences. Both the BI and “transformed 
COI” MP analyses represent attempts to 
compensate for the saturation contained within 
the COI DNA sequence data partition. We conclude 
that G & C’s COI DNA sequence deletions, 
when combined with MP analyses without “corrections” 
for COI saturation, negatively affected 
the topologies of the best trees obtained from 
phylogenetic analyses of the CE matrix. This 
conclusion questions the utility of the inferences 
regarding palaeoheterodont bivalve character 
evolution as well as the taxonomic classification 
drawn from G & C’s preferred topology (G & C: 
figs. 3, 4; Fig. 1 herein). 

FIG. 2. Summary tree after the evolutionary relationships within the order Unionoida 
as portrayed in Bogan & Hoeh (2000: fig. 1) combined with those in Hoeh et al. (2001: 
fig. 14.4). Familial nomenclature: from Bogan & Hoeh (2000); subordinal nomenclature: 
proposed here. 

MATERIALS & METHODS 


We obtained G & C’s aligned dataset from 
the following website: http://www.mussel-project.net/. 
GenBank accession numbers 
for all of the sequences analyzed herein are 
presented in G & C (Table 2) except for the COI 
sequences of Mytilus edulis (Hoeh et al., 1996) 
and Astarte castanea (GenBank accession 
number AF120662). To align the Mytilus and 
Astarte COI sequences with those in G & C’s 
matrix, all COI DNA sequences were translated 
to amino acids, using the Drosophila mtDNA 
genetic code, and aligned with CLUSTAL_X 
(Larkin et al., 2007). Subsequently, the amino 
acid alignment was used to align the nucleotide 
sequences. The three-partition matrices utilized 
in our phylogenetic analyses, outlined below, 
are available upon request. 
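
The final step of this procedure, using an amino acid alignment to align the underlying nucleotides, can be illustrated with a minimal Python sketch. It is not the software used herein: the function name thread_codons and the toy sequences are hypothetical, and the sketch assumes that each coding sequence is complete, in frame, and exactly three times as long as its ungapped protein translation.

```python
def thread_codons(aligned_protein: str, nucleotides: str, gap: str = "-") -> str:
    """Place the codons of an ungapped coding sequence under a gapped protein row.

    Hypothetical helper for illustration only. Each residue in the aligned
    protein row consumes one codon; each protein gap becomes a three-column gap.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(nucleotides) != 3 * n_residues:
        raise ValueError("nucleotide length must be 3x the number of residues")

    columns = []
    pos = 0  # index of the next unused nucleotide
    for aa in aligned_protein:
        if aa == gap:
            columns.append(gap * 3)
        else:
            columns.append(nucleotides[pos:pos + 3])
            pos += 3
    return "".join(columns)


# Toy example (invented sequences): one protein gap yields one codon-sized gap.
aligned = thread_codons("M-K", "ATGAAA")
print(aligned)
```

Because gaps are only ever inserted in multiples of three, codon positions remain in register across all terminals, which is what later allows the 1st, 2nd and 3rd codon positions to be treated as separate partitions.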

Phylogenetic trees were estimated using 
Bayesian inference (BI) and maximum parsi- 
mony (MP) approaches. Bayesian analyses 
were conducted using the program MrBayes 
(version 3.1.2; Huelsenbeck & Ronquist, 2001; 
Ronquist & Huelsenbeck, 2003). Bayesian 
searches were run for 10 million generations 
with ten search chains, and the data were divided 
into five partitions (COI: 1st, 2nd and 3rd 
codon positions; 28S; morphology/life history), 
saving 10,000 trees (one tree every 1,000 generations). 
These searches utilized the GTR + G 
+ I substitution model (Rodriguez et al., 1990) 
for the DNA partitions and the standard discrete 
model (Lewis, 2001a; Ronquist et al., 2005) 
for the morphology/life history data partition. 
To allow each partition to have its own set of 
parameter estimates, revmat, tratio, statefreq, 
shape, and pinvar were all unlinked during the 
analysis. Burn-in was determined by visual 
inspection of the likelihood score plots obtained 
as the trees were written to the tree file. In the 
Bayesian analyses, stationarity was reached 
before one million generations, and the first 
1,000 trees were discarded (i.e., the first million 
generations) as the burn-in. 
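
The sampling and burn-in figures given above follow from simple arithmetic, sketched below in Python. The function name burnin_summary is hypothetical, and the sketch counts only the trees sampled at the stated interval (whether the program also writes a tree at generation zero is not addressed here).

```python
def burnin_summary(generations: int, sample_every: int, burnin_generations: int):
    """Return (trees sampled, trees discarded as burn-in, trees retained)."""
    sampled = generations // sample_every
    discarded = burnin_generations // sample_every
    return sampled, discarded, sampled - discarded


# Values taken from the text: 10 million generations, one tree every 1,000
# generations, and the first million generations treated as burn-in.
sampled, discarded, retained = burnin_summary(10_000_000, 1_000, 1_000_000)
print(sampled, discarded, retained)  # 10000 1000 9000
```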

Maximum parsimony searches were done 
with PAUP* (v.4.0b10; Swofford, 2002). Heu- 
ristic searches of the CE dataset were con- 
ducted with 10,000 random stepwise addition 
sequences with TBR branch-swapping, max 
trees set at 20,000, gaps handled as missing 
data, and MULTREES in effect. Additionally, 
10,000 full heuristic non-parametric bootstrap 
replicates were done to assess levels of nodal 
support. All phylogenetic analyses were done 
on CE datasets that included the Mytilus and 
Astarte COI sequences and MP analyses were 
carried out on CE matrices with transformed (= 
3rd codon position transitions deleted) and 
non-transformed COI sequences. All MP analyses 
used equal weighting for each character in 
the CE matrix and all BI analyses used the CE 
matrix with non-transformed COI sequences. All 
of the phylogenetic analyses presented herein 
included the COI sequences from Pseudomulleria, 
Coelatura and Obliquaria. 
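
The text above does not specify the mechanism by which 3rd codon position transitions were deleted. One common way to obtain that effect, shown here only as an assumed illustration, is to recode third positions as purine (R) or pyrimidine (Y), so that A/G and C/T differences vanish and only transversions remain informative. The function name recode_third_positions is hypothetical, and the sketch assumes that the alignment begins at a first codon position.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def recode_third_positions(seq: str) -> str:
    """RY-code every third codon position of an in-frame nucleotide sequence.

    Assumption for illustration: index 0 is a first codon position. Gaps,
    missing data and ambiguity codes are passed through unchanged.
    """
    recoded = []
    for i, base in enumerate(seq.upper()):
        if i % 3 == 2:  # third codon position
            if base in PURINES:
                base = "R"
            elif base in PYRIMIDINES:
                base = "Y"
        recoded.append(base)
    return "".join(recoded)


# Two toy sequences that differ only by third-position transitions
# become identical after recoding.
print(recode_third_positions("ATGAAACCT"))  # ATRAARCCY
print(recode_third_positions("ATAAAGCCC"))  # ATRAARCCY
```

An alternative with the same intent would be a third-position step matrix that assigns transitions zero cost; either way, first and second positions are left untouched.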

The optimization of cemented (= oyster-like) 
vs. non-cemented shell morphology was based 
on the Bayesian topology with the highest 
overall posterior probability and was carried 
out using the ML algorithm in Mesquite (v.2.6; 
Maddison & Maddison, 2008). Both the Markov 
k-state one parameter model (MK1 model in- 
vokes equal forward and backward transition 
rates; Lewis, 2001a) and the “Asymmetrical 
Markov k-state 2 parameter model” (Asym- 
mMK model in which “forward” and “backward” 
transition rates can be different; Pagel, 1997; 
Mooers & Schluter, 1999; Maddison, 2006) 
were explored for differences in optimization 
results and overall likelihood score. Because 
the two models are nested with the AsymmMK 
model having one more parameter than the 
MK1 model, a likelihood ratio test following a 
chi-squared distribution (df = 1) can be used 
to determine whether one model fits the data 
significantly better than the other (Pagel, 1994; 
see the Mesquite manual). The optimizations 
incorporated branch length and parameter es- 
timates from the Bayesian analyses. To make 
decisions regarding the significance of ancestral 
character states, ancestral character state 
estimates with a log likelihood two or more units 
lower than the best state estimate (decision 
threshold [T] set to T = 2) were rejected (Ed- 
wards, 1972; Pagel, 1999). Generally viewed 
as a conservative cutoff, this threshold has 
been used by numerous recent authors (e.g., 
Moczek et al., 2006; Fernandez & Morris, 2007; 
Murphy et al., 2007; Koepfli et al., 2008). 
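
The two decision rules described above can be sketched in Python as follows. This is an illustration under stated assumptions, not output from Mesquite: the function names and all numerical values are hypothetical, log likelihoods are taken as natural-log values (negative numbers, higher is better), and the chi-squared tail probability for df = 1 is computed from the identity P(X > x) = erfc(sqrt(x / 2)).

```python
import math


def lrt_df1(lnl_simple: float, lnl_complex: float):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns the test statistic 2 * (lnL_complex - lnL_simple) and its
    chi-squared (df = 1) tail probability.
    """
    stat = max(2.0 * (lnl_complex - lnl_simple), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def rejected_states(state_lnl: dict, threshold: float = 2.0) -> set:
    """States whose log likelihood is `threshold` or more units below the best."""
    best = max(state_lnl.values())
    return {s for s, lnl in state_lnl.items() if best - lnl >= threshold}


# Invented log likelihoods for an MK1 vs. AsymmMK comparison.
stat, p = lrt_df1(lnl_simple=-10.0, lnl_complex=-9.5)
print(round(stat, 2), p > 0.05)  # 1.0 True: the extra parameter is not justified

# Invented per-state log likelihoods at one node, with T = 2.
print(rejected_states({"cemented": -12.5, "non-cemented": -10.0}))  # {'cemented'}
```

Under this rule a node is treated as significantly reconstructed when all but one state are rejected; if no state falls two or more units below the best, the node is left ambiguous.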


RESULTS 


Figures 3–5 contain the best (BI) and strict 
consensus (MP) trees produced by our analy- 
ses of the G & C CE dataset. Figure 3 portrays 
the best BI tree (= tree with the highest overall 
posterior probability) with parsimony-estimated 
branch lengths. Figure 4 portrays the strict 
consensus MP tree that resulted from analysis 
of the CE matrix containing transformed COI 
sequences and Figure 5 portrays the strict 
consensus MP tree that resulted from analysis 
of the CE matrix containing non-transformed 
COI sequences. Significant nodal support is indicated 
by posterior probabilities ≥ 0.95 (Fig. 3) 
and bootstrap percentages ≥ 70 (Figs. 4, 5). 
Drastic topological effects were observed 
among the trees resulting from distinct analytical 
procedures (e.g., Figs. 3, 4 [from techniques that 
utilize corrections for COI sequence saturation] 
vs. Fig. 5 [no correction for saturation]). For example, 
Figures 3 and 4 display the Hyriidae as 
the basal unionoid bivalve lineage while Figure 
5 indicates that that family is sister to the Etherioidea 
(sensu Bogan & Hoeh, 2000; Hoeh et al., 
2001: Iridinidae + Mycetopodidae + Acostaea 
+ Etheria) + Pseudomulleria. Furthermore, the 
MP phylogenetic analysis utilizing the CE matrix 
with non-transformed COI sequences produced 
a consensus tree with fewer clades possessing 
significant nodal support values (Fig. 5) compared 
to the trees generated using techniques 
that compensate for COI sequence saturation 
(Figs. 3, 4). For example, the Hyriidae was consistently 
found statistically monophyletic in the 
BI (Fig. 3) and “transformed COI” CE MP (Fig. 4) 
analyses, but this was not the case in the “non-transformed 
COI” CE MP analysis (Fig. 5). 

FIG. 3. Best tree obtained from 10 million generation Bayesian searches of the CE dataset with 10 search 
chains and five data partitions under the GTR + G + I model. Posterior probabilities ≥ 0.95 (from the BI 
majority-rule tree) are indicated by asterisks. Familial and superfamilial nomenclature: from G & C (2006). 

The family Etheriidae (sensu G & C: unionoid 
bivalve oysters) is not a monophyletic taxon in 
any of the three trees generated herein (Figs. 
3–5). The best BI tree (Fig. 3) clearly indicates, 
via topology and significant nodal support 
values, that Pseudomulleria is a unionid and 
that Acostaea and Etheria are members of 
a monophyletic Etherioidea (sensu Bogan & 
Hoeh, 2000; Hoeh et al., 2001). The Etherioidea 
(sensu Bogan & Hoeh, 2000; Hoeh et al., 
2001) was also statistically supported as monophyletic 
by the analyses of the “transformed 
COI” CE matrix (Fig. 4) and “non-transformed 
COI” CE matrix (Fig. 5). In addition, there was 
a significant divergence among phylogenetic 
techniques regarding the evolutionary reality 
of the largest unionoid bivalve family: the 
Unionidae (including Pseudomulleria) was 
found monophyletic (Fig. 3) and, alternatively, 
non-monophyletic (Figs. 4, 5). However, only 
the monophyly indicated in Figure 3 was supported 
by a significant nodal support value. 

FIG. 4. Strict consensus of all equally parsimonious trees obtained by maximum parsimony analyses 
of the transformed CE dataset from 10,000 heuristic, random stepwise addition sequence replicates 
utilizing TBR branch swapping and ‘max trees’ set to 20,000. Bootstrap support values ≥ 70% (from 
10,000 full heuristic bootstrap replicates) are indicated by asterisks. Familial and superfamilial 
nomenclature: from G & C (2006). 

A ML optimization of cemented (= oyster-like) 
vs. non-cemented shell morphology, using the 
AsymmMK model of character evolution, is 
presented in Figure 6. Although the AsymmMK 
model had a slightly higher log-likelihood score, 
the likelihood ratio test showed that this model 
did not fit the data significantly better than 
the MK1 model (equal forward and backward 
transition rates). However, optimizing with either 
model resulted in the same inference: three 
independent transitions from non-cemented to 
cemented morphology. Using the AsymmMK 
model, all of the ancestral character state estimations 
were significant except for the root node (as 
indicated by an asterisk on Fig. 6), whereas all 
nodes were significant under the MK1 model. 

FIG. 5. Strict consensus of all equally parsimonious trees obtained by maximum parsimony analyses 
of the non-transformed CE dataset from 10,000 heuristic random stepwise addition sequence replicates 
utilizing TBR branch swapping and ‘max trees’ set to 20,000. Bootstrap support values ≥ 70% 
(from 10,000 full heuristic bootstrap replicates) are indicated by asterisks. Familial and superfamilial 
nomenclature: from G & C (2006). 

FIG. 6. Maximum likelihood optimization of cemented vs. non-cemented shell morphology, on the tree in 
Figure 3, produced by Mesquite using the AsymmMK model. Significance of ancestral character state 
estimates determined by one character state having a log likelihood two or more units higher than the 
other state. All nodes are significant for a single character state except for the single node denoted 
with an asterisk (*). 

The BI analysis of G & C’s CE matrix utilized 
the most sophisticated methodologies for dealing 
with inter- and intra-partition rate heterogeneity 
and produced the most resolved trees (based 
on comparisons of tree topologies). The tree 
produced by BI analysis (Fig. 3) indicates signifi- 
cant support for the monophyly of the Unionidae 
(including Pseudomulleria), Margaritiferidae, 
Hyriidae, Etherioidea (excluding both Pseudo- 
mulleria and the Hyriidae [addressed below]) 
and Unionoida. In two of our three trees (Figs. 
3, 4), the Hyriidae is not the sister lineage to the 
Etheriidae (sensu G & C) + Iridinidae + Myc- 
etopodidae clade as portrayed in the preferred 
tree of G & C (Fig. 1). Only MP analysis of G & 
C’s CE matrix containing non-transformed COI 
sequences produced a clade containing hyriids 
plus Etheriidae (sensu G & C) + Iridinidae + 
Mycetopodidae (Fig. 5). However, few of the 
estimated interfamilial relationships are sup- 
ported by significant nodal support values in our 
analyses of G & C’s CE matrix (Figs. 3–5). 


DISCUSSION 


Many recent authors view analyses of 
concatenated multi-gene datasets as those 
which produce the best possible estimates 
of phylogeny (e.g., Sanderson et al., 2003; 
Driskell et al., 2004; Gadagkar et al., 2005; de 
Queiroz & Gatesy, 2007). Therefore, it is our 
opinion that future combined evidence analy- 
ses of palaeoheterodont bivalve phylogeny 
should not exclude data unless a compelling 
case can be made for each exclusion. For 
example, previous phylogenetic analyses have 
deleted “extremely long branch” terminals 
because of significant variation in estimated 
terminal branch lengths (e.g., Spears & Abele, 
2000). However, this problem can be better 
addressed by adding taxa to break up long 
branches (Graybeal, 1998), thus eliminating 
the need for discarding terminals. Additionally, 
if particular sequences are objectively found 
to be “inferior” to others (e.g., due to sequenc- 
ing ambiguities such as the presence of many 
uncertain base calls, single nucleotide indels 
in protein coding genes or specimen iden- 


tification/laboratory processing errors), this 
could constitute a valid reason for exclusion. 
However, strict and objective criteria need to be 
employed in making this type of determination 
to guard against potential investigator bias in 
matrix construction (e.g., Hillis, 1998). After all, 
G & C’s own statement still rings true: “By 
combining [datasets], the results 
can be interpreted in a broader context, and 
the shortcomings and advantages of each 
partition can be assessed relative to the others” 
(G & C, p. 347). We think 
that a model paper in this regard is Campbell 
et al. (2005), which presented analyses of a 
three partition data matrix containing all of the 
relevant DNA sequences and subsequently 
discussed potential problems with individual 
sequences (but its figured trees were based on 
analyses that included all relevant sequences). 
It is also our opinion, and that of others (e.g., 
Huelsenbeck & Ronquist, 2001; Lewis, 2001b; 
Ronquist & Huelsenbeck, 2003; Huelsenbeck 
et al., 2004; Lewis et al., 2005; Alfaro & Holder, 
2006), that phylogenetic methodologies that 
attempt to compensate for saturation should be 
employed, in parallel, if an investigator desires 
to use equally weighted MP analyses on non- 
transformed DNA sequences. Again, Campbell 
et al. (2005) is a model paper in that it used 
both BI and equally weighted MP analyses on 
non-transformed DNA sequences. 

Regarding the validity of the Etheriidae (= 
Etheria + Acostaea + Pseudomulleria; sensu 
G & C), Graf's (2000) determination of sister 
taxa status for Acostaea and Etheria, based 
on an analysis of morphological and life history 
characters, is consistent with a monophyletic 
Etheriidae, but Pseudomulleria was not in- 
cluded in that analysis. G & C, whose preferred 
analyses contained a Pseudomulleria terminal 
but only included morphological and life history 
characters in the matrix, confirmed the mono- 
phyly of the Etheriidae. Alternatively, Bogan & 
Hoeh’s (2000) analysis of COI DNA sequences, 
that included data for Etheria, Acostaea and 
Pseudomulleria, indicates that Pseudomul- 
leria is a unionid, as does Woodward's (1898) 
anatomical study, thus rejecting the concept 
of a monophyletic Etheriidae. Herein, phy- 
logenetic analyses of the G & C CE dataset 
also produced best BI and strict consensus 
MP trees that statistically rejected etheriid 
(sensu G & C) monophyly (Figs. 3–5). Thus, all 
topologies from the total evidence dataset ana- 
lyzed herein are consistent in that they do not 
support etheriid monophyly but, alternatively, 
Figures 3 and 4 do support Prashad’s (1931) 
and Heard & Hanning’s (1978) independent 
origins/convergence hypothesis for “etheriid” 
conchology. Regarding the latter, ML-based 
optimization of cemented (= oyster-like) vs. 
non-cemented shell morphology, using the BI 
tree (Fig. 3), indicates three distinct origins of 
cementation in the Unionoida (Fig. 6; as did 
the parsimony-based optimization presented 
in Bogan & Hoeh [2000: fig. 1]). The above 
discussion indicates that G & C’s concept of 
the Etheriidae requires revision. 

Regarding the interfamilial evolutionary re- 
lationships of the Hyriidae, classifications of 
Ortmann (1912, 1921), the morphology-based 
analyses of Graf (2000), Hoeh et al. (2001), 
and Roe & Hoeh (2003), the total evidence- 
based analysis of Roe & Hoeh (2003), as well 
as the preferred CE analysis of G & C (Fig. 1) 
all support the hypothesis of a relatively close 
evolutionary relationship among hyriid and 
etherioid (sensu Bogan & Hoeh, 2000; Hoeh 
et al., 2001) bivalves. Alternatively, the COI 
sequence- and total evidence-based analyses 
of Bogan & Hoeh (2000), Hoeh et al. (2001, 
2002) and Walker et al. (2006) indicate that 
the Hyriidae is the basal unionoid bivalve lin- 
eage. Analyses of G & C’s CE dataset, using 
phylogenetic methods that compensate for the 
acknowledged saturation in COI sequences 
at this time depth (Hoeh et al., 1998; Bogan 
& Hoeh, 2000; Graf & Ó Foighil, 2000; Graf & 
Cummings, 2006), produce best BI and strict 
consensus MP trees that support the latter 
hypothesis (Figs. 3, 4). The above discussion 
suggests that G & C’s concept of the Etheri- 
oidea requires revision. Within the confines 
of the analytical methods employed herein, 
which are clearly not exhaustive, the only tree 
to support the Etherioidea sensu G & C (Fig. 5) 
required for its generation an equally weighted 
MP analysis that used non-transformed COI sequences. 
We predict that phylogenetic methods 
that correct for saturation in DNA sequences 
(e.g., BI and “transformed” MP analyses) will 
typically yield better estimates of unionoid bi- 
valve phylogeny than will methods that do not 
correct for saturation. However, the generally 
poor nodal support values for most interfamilial 
relationships obtained herein from all analyses 
of G & C’s CE dataset (Figs. 3–5) indicate that 
we cannot objectively choose among interfamil- 
ial relationship hypotheses at this time. Thus, 
robust estimates of unionoid bivalve (1) interfa- 
milial relationships, (2) character evolution, and 
(3) lineage-specific species richness, as well as 
a stable, phylogeny-based classification for the 
group, await appropriate phylogenetic analyses 
utilizing more inclusive datasets. 